AITopics

2511.21689

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Jan-25-2025, 00:50:02 GMT

Review for NeurIPS paper: Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

I am not sure what the green cross, diamond, etc. indicate. Are those distilled models, and from which AutoML system were they obtained? Moreover, I am rather skeptical of seeing only the mean. I would have loved to understand where your method is significantly better and where it fails, for example through a best-case, worst-case, and average-case analysis. Reporting the mean alone can be misleading. In Section 3.1 (Maximum Pseudo-likelihood Estimation): Tabular data typically contains numerical, categorical, and text-based data.

augmented distillation, distillation, section 3, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Abdellatif, Mohamed Hisham

(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges

arXiv.org Artificial Intelligence Jan-2-2025

Large Language Models (LLMs) have become essential tools across various domains due to their impressive capabilities in understanding and generating human-like text. The ability to accurately answer multiple-choice questions (MCQs) holds significant value in education, particularly in automated tutoring systems and assessment platforms. However, adapting LLMs to handle MCQ tasks effectively remains challenging due to hallucinations and unclear prompts. This work explores the potential of Microsoft's PHI-3 (Abdin et al., 2024), a compact yet efficient LLM, for MCQ answering. Our contributions include fine-tuning the model on the TruthfulQA dataset, designing optimized prompts to enhance model performance, and evaluating using perplexity and traditional metrics like accuracy and F1 score. Results show a remarkable improvement in PHI-3.5's MCQ handling post-fine-tuning, with perplexity decreasing from 4.68 to 2.27, and accuracy rising from 62% to 90.8%. This research underlines the importance of efficient models in adaptive learning systems and educational assessments, paving the way for broader integration into the classroom, particularly in fields like test preparation, student feedback, and personalized learning.
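For readers unfamiliar with the perplexity figures quoted above (4.68 before fine-tuning, 2.27 after), the standard definition is the exponential of the mean negative log-likelihood per token. The following is a minimal sketch of that definition only; the function name and the log-probabilities are illustrative assumptions and are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to each
    observed token (hypothetical values here, not data from the paper).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that spreads probability evenly over four options at every step
# has a perplexity of about 4: it is "as uncertain as" a four-way guess.
uniform_over_four = [math.log(0.25)] * 8
print(perplexity(uniform_over_four))

# A more confident model (probability 0.5 on each observed token) scores lower.
more_confident = [math.log(0.5)] * 8
print(perplexity(more_confident))
```

Lower perplexity means the model assigns higher probability to the observed tokens, which is the direction of the change reported in the abstract.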

artificial intelligence, large language model, natural language, (20 more...)

2501.01588

Country: Africa > Middle East > Egypt (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.54)
Education > Assessment & Standards (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence Oct-13-2024

A resource-efficient model for deep kernel learning

D'Amore, Luisa

According to the Hughes phenomenon, the major challenges encountered in computations with learning models come from the scale of complexity, e.g. the so-called curse of dimensionality. There are various approaches for accelerating learning computations with minimal loss of accuracy. These range from model-level to implementation-level approaches. To the best of our knowledge, the first one is rarely used in its basic form. Perhaps this is because the theoretical understanding of the mathematical insights behind model decomposition approaches, and thus the ability to develop mathematical improvements, has lagged behind. We describe a model-level decomposition approach that combines both the decomposition of the operators and the decomposition of the network. We perform a feasibility analysis on the resulting algorithm, both in terms of its accuracy and scalability.

artificial intelligence, data mining, machine learning, (19 more...)

2410.09926

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Nassau County > Mineola (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Tillmann, Christoph, Trivedi, Aashka, Bhattacharjee, Bishwaranjan

Efficient Models for the Detection of Hate, Abuse and Profanity

arXiv.org Artificial Intelligence Feb-8-2024

Large Language Models (LLMs) are the cornerstone for many Natural Language Processing (NLP) tasks like sentiment analysis, document classification, named entity recognition, question answering, summarization, etc. LLMs are often trained on data which originates from the web. This data is prone to having content with Hate, Abuse and Profanity (HAP). For a detailed definition of HAP, please refer to the Appendix. Due to the LLMs being exposed to HAP content during training, the models learn it and may then generate hateful or profane content. For example, when the open-source RoBERTa model (specifically, the RoBERTa base model) from the HuggingFace (HF) Transformers library is prompted to replace the mask token in `I do not know that Persian people are that MASK` it returns the word `stupid` with the highest score. This is unacceptable in civil discourse. The detection of Hate, Abuse and Profanity in text is a vital component of creating civil and unbiased LLMs, which is needed not only for English, but for all languages. In this article, we briefly describe the creation of HAP detectors and various ways of using them to make models civil and acceptable in the output they generate.

architecture, hap score, hypothesis, (14 more...)

2402.05624

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial Intelligence May-10-2023

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM

Guskin, Shira, Wasserblat, Moshe, Wang, Chang, Shen, Haihao

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. A knowledge distillation approach addresses the computational efficiency by self-distilling BERT into a smaller transformer representation having fewer layers and smaller internal embedding. However, the performance of these models drops as we reduce the number of layers, notably in advanced NLP tasks such as span question answering. In addition, a separate model must be trained for each inference scenario with its distinct computational budget. Dynamic-TinyBERT tackles both limitations by partially implementing the Length Adaptive Transformer (LAT) technique onto TinyBERT, achieving x3 speedup over BERT-base with minimal accuracy loss. In this work, we expand the Dynamic-TinyBERT approach to generate a much more efficient model. We use MiniLM distillation jointly with the LAT method, and we further enhance the efficiency by applying low-bit quantization. Our quantized length-adaptive MiniLM model (QuaLA-MiniLM) is trained only once, dynamically fits any inference scenario, and achieves an accuracy-efficiency trade-off superior to other efficient approaches at any computational budget on the SQuAD1.1 dataset (up to x8.8 speedup with <1% accuracy loss). The code to reproduce this work is publicly available on GitHub.
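The abstract mentions low-bit quantization without describing the scheme. As a generic illustration of the idea only, the sketch below applies symmetric per-tensor 8-bit quantization to a short list of weights. The bit width, the per-tensor scale, and all function names are assumptions made for this example and are not claimed to match the authors' implementation.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Assumed scheme for illustration; the abstract does not specify one.
    """
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to make the rounding easy to follow.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as a small integer plus one shared scale, so the rounding error per weight is bounded by half the scale. That bounded error is the accuracy cost traded for smaller, faster arithmetic.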

artificial intelligence, machine learning, natural language, (18 more...)

2210.17114

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

#artificialintelligence Dec-11-2022, 09:35:14 GMT

Transformers for Multi-Regression -- [PART1] – Towards AI

Originally published on Towards AI, the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider...

artificial intelligence, machine learning, transformer, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence Aug-30-2020, 14:45:32 GMT

How Does PCA Dimension Reduction Work For Images?

In machine learning, we need lots of data to build an efficient model, but dealing with a larger dataset is not easy: we need to work hard at preprocessing the data, and as data scientists we will come across situations involving a large number of variables. Here PCA (principal component analysis), a dimension reduction technique, helps in dealing with those problems. In this article, we will demonstrate how to work on larger data and images using PCA. PCA is a dimensionality reduction technique often used to compress the variables of a larger dataset into a smaller set that retains most of the information needed to build an efficient model. In a real-world scenario, reducing the number of variables in a dataset usually means compromising on model accuracy, but PCA can still give good accuracy. The idea of PCA is to reduce the number of variables in the dataset while preserving as much of the information as possible.
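The idea of keeping a few components while preserving most of the information can be sketched with NumPy alone. This is a minimal illustration, not the article's code: the synthetic 64x64 "image", the function name `pca_compress`, and the choice of five components are all assumptions made for the example.

```python
import numpy as np


def pca_compress(image, n_components):
    """Project image rows onto the top principal components and reconstruct.

    Each row of `image` is treated as one sample and each column as one
    variable, a common convention when applying PCA to a grayscale image.
    """
    mean = image.mean(axis=0)
    centered = image - mean
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T            # reduced representation
    reconstruction = scores @ components + mean  # back to pixel space
    return scores, reconstruction


rng = np.random.default_rng(0)
# Hypothetical grayscale image: low-rank structure plus a little noise.
image = rng.normal(size=(64, 5)) @ rng.normal(size=(5, 64))
image = image + 0.01 * rng.normal(size=(64, 64))

scores, approx = pca_compress(image, n_components=5)
rel_err = np.linalg.norm(image - approx) / np.linalg.norm(image)
print(scores.shape, rel_err)
```

Here 64 columns are reduced to 5 scores per row, and the small relative reconstruction error shows that little information is lost when the data really does lie close to a low-dimensional subspace. Real photographs are less cleanly low-rank, so the number of components needed would have to be chosen from the explained variance.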

artificial intelligence, machine learning, principal component analysis, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.48)

#artificialintelligence Nov-8-2019, 01:51:38 GMT

Pattern Recognition : How is it different from Machine Learning Edureka

Pattern Recognition is one of the key features that govern any AI or ML project. The Machine Learning industry is booming and heading in a good direction. In today's world, many different types of data flow across systems. To categorize this data, we cannot use traditional programming, which relies on rules that check conditions and classify data. The solution to this problem is Machine Learning, with which we can create a model that classifies different patterns in data. One application of this is classifying messages as spam or non-spam.

artificial intelligence, machine learning, pattern recognition, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

#artificialintelligence Oct-5-2019, 18:09:04 GMT

Machine Learning Helps Create Detailed, Efficient Models of Water

How water acts affects everything from storm clouds to ice sheets. Computer scientists want to model water's various properties. Accurate and computationally efficient molecular-level descriptions of large samples of ice-water systems are difficult to build. The numerous molecules and various timescales remain a challenge despite advances in computing hardware. Now, a team developed machine-learning–based water models that correctly predict water's key features, such as the melting point of ice.

argonne leadership computing facility, efficient model, water model, (6 more...)

Industry: Energy (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.81)